Variational Autoencoder

A generative model that learns a latent representation by combining an encoder and a decoder. Training maximizes the ELBO (Evidence Lower Bound):

ELBO(θ, φ) = E_{q_φ(z|x)}[log p_θ(x|z)] − KL(q_φ(z|x) ‖ p(z))

The first term is reconstruction quality; the KL term regularizes the latent space toward the prior (typically a standard Gaussian). In the rate-distortion view, the relevant rate term is the data-averaged KL, E_{p(x)}[KL(q_φ(z|x) ‖ p(z))].
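The two ELBO terms above can be sketched numerically. This is a minimal NumPy illustration, not a full VAE: it assumes a diagonal-Gaussian encoder (so the KL against a standard Gaussian prior has a closed form) and a unit-variance Gaussian decoder (so the reconstruction log-likelihood reduces to a squared error, up to a constant). The function names are illustrative.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ),
    summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def elbo(x, x_recon, mu, logvar):
    """ELBO = reconstruction log-likelihood - KL.
    With a unit-variance Gaussian decoder, the log-likelihood is
    -0.5 * ||x - x_recon||^2 up to an additive constant (dropped here)."""
    recon_loglik = -0.5 * np.sum((x - x_recon)**2, axis=-1)
    return recon_loglik - gaussian_kl(mu, logvar)
```

A posterior that exactly matches the prior (mu = 0, logvar = 0) contributes zero KL, and a perfect reconstruction contributes zero distortion, so the ELBO sketch is maximized at 0 in that case.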

Variations

β-VAE

Weights the KL term by β:

L_β = E_{q_φ(z|x)}[log p_θ(x|z)] − β · KL(q_φ(z|x) ‖ p(z))

Setting β > 1 tightens the latent bottleneck and typically increases disentanglement pressure, often at the cost of reconstruction quality. Disentangling means latent dimensions align more with independent factors of variation (e.g., color, shape, size) rather than mixing them.

This directly controls the rate-distortion tradeoff: β sets the balance between compression (rate, the KL term) and fidelity (distortion, the reconstruction term).
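The β-weighted objective above, written as a loss to minimize, is distortion plus β times rate. A minimal NumPy sketch under the same assumptions as before (diagonal-Gaussian encoder, unit-variance Gaussian decoder; the function name is illustrative):

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Negative beta-ELBO: distortion + beta * rate.
    distortion = 0.5 * ||x - x_recon||^2  (unit-variance Gaussian decoder)
    rate       = KL( N(mu, diag(exp(logvar))) || N(0, I) )"""
    distortion = 0.5 * np.sum((x - x_recon)**2, axis=-1)
    rate = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)
    return distortion + beta * rate
```

With β = 1 this recovers the negative ELBO; raising β penalizes the rate term more heavily, pushing the posterior toward the prior and trading reconstruction quality for a tighter bottleneck.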